Abstract

The analysis of randomized search heuristics on classes of functions is fundamental for the understanding of the underlying stochastic process and the development of suitable proof techniques. Recently, remarkable progress has been made in bounding the expected optimization time of the simple (1+1) EA on the class of linear functions. We improve the best known bound in this setting from $(1.39 + o(1))en\ln n$ to $en\ln n + O(n)$ in expectation and with high probability, which is tight up to lower-order terms. Moreover, upper and lower bounds for arbitrary mutation probabilities $p$ are derived, which imply expected polynomial optimization time as long as $p = O((\ln n)/n)$ and which are tight if $p = c/n$ for a constant $c$. As a consequence, the standard mutation probability $p = 1/n$ is optimal for all linear functions, and the (1+1) EA is found to be an optimal mutation-based algorithm. The proofs are based on adaptive drift functions and the recent multiplicative drift theorem.

1 Introduction 



The rigorous runtime analysis of randomized search heuristics, in particular of evolutionary computation, is a growing research area where many results have been obtained in recent years. This line of research started off in the early 1990s (Mühlenbein, 1992) with the consideration of very simple evolutionary algorithms such as the well-known (1+1) EA on very simple example functions such as the well-known OneMax function, and important tools for the analysis were developed. Nowadays the state of the art in the field allows for the analysis of different types of search heuristics on problems from combinatorial optimization (Neumann and Witt, 2010).

Recently, the analysis of evolutionary algorithms on linear pseudo-boolean functions has experienced a great renaissance. The first proof that the (1+1) EA optimizes any linear function in expected time $O(n \log n)$, by Droste, Jansen and Wegener (2002), was highly technical since it did not yet explicitly use the analytic framework of drift analysis (Hajek, 1982), which allowed for a considerably simplified proof of the $O(n \log n)$ bound; see He and Yao (2004) for the first complete proof using the method.¹ Another major improvement was made by Jägersküpper (2008), who for the first time stated bounds on the implicit constant hidden in the $O(n \log n)$ term. This constant was finally improved by Doerr, Johannsen and Winzen (2010a) to the bound $(1.39 + o(1))en\ln n$ using a clean framework for the analysis of multiplicative drift (Doerr, Johannsen and Winzen, 2010b). The best known lower bound for general linear functions with non-zero weights is $en\ln n - O(n)$ and was also proven by Doerr, Johannsen and Winzen (2010a), building upon the case of the OneMax function analyzed by Doerr, Fouz and Witt (2010, 2011).

The standard (1+1) EA flips each bit with probability $p = 1/n$, but different values for the mutation probability $p$ have also been studied in the literature. Recently, it has been proved by Doerr and Goldberg (2011) that the $O(n \log n)$ bound on the expected optimization time of the (1+1) EA still holds (also with high probability) if $p = c/n$ for an arbitrary constant $c$. This result uses the multiplicative drift framework mentioned above and a drift function that is cleverly tailored towards the particular linear function. However, the analysis is also highly technical and does not yield explicit constants in the $O$-term. For $p = \omega(1/n)$, no runtime analyses were known so far.

In this paper, we prove that the (1+1) EA optimizes all linear functions in expected time $en\ln n + O(n)$, thereby closing the gap between the upper and the lower bound up to terms of lower order. Moreover, we show a general upper bound depending on the mutation probability $p$, which implies that the expected optimization time is polynomial as long as $p = O((\ln n)/n)$ (and $p = \Omega(1/\mathrm{poly}(n))$). Since the expected optimization time is proved to be superpolynomial for $p = \omega((\ln n)/n)$, this implies a phase transition in the regime $\Theta((\ln n)/n)$. If the mutation probability is $c/n$ for some constant $c$, the expected optimization time is proved to be $(1 \pm o(1))\frac{e^c}{c} n\ln n$. Altogether, we obtain that the standard choice $p = 1/n$ of the mutation probability is optimal for all linear functions. This is remarkable since this seems to be the choice that is most often recommended by practitioners in evolutionary computation (Bäck, 1993). In fact, the lower bounds hold for the large class of so-called mutation-based EAs, in which the (1+1) EA with $p = 1/n$ is found to be an optimal algorithm.

The proofs of the upper bounds use the recent multiplicative drift theorem and a drift function that is adapted towards both the linear function and the mutation probability. As a consequence of our main result, we obtain the results by Doerr and Goldberg (2011) with less effort and explicit constants in front of the $n\ln n$-term. All these bounds hold also with high probability, which follows from the recent tail bounds added to the multiplicative drift theorem by Doerr and Goldberg (2011). The lower bounds are based on a new multiplicative drift theorem for lower bounds.

¹ Note, however, that not the original (1+1) EA but a variant rejecting offspring of equal fitness is studied in that paper.

This paper is structured as follows. Section 2 sets up definitions, notations and other preliminaries. Section 3 summarizes and explains the main results. In Sections 4 and 5, respectively, we prove an upper bound for general mutation probabilities and a refined result for $p = 1/n$. Lower bounds are shown in Section 6. We finish with some conclusions.



2 Preliminaries 

The (1+1) EA is a basic search heuristic for the optimization of pseudo-boolean functions $f\colon \{0,1\}^n \to \mathbb{R}$. It reflects the typical behavior of more complicated evolutionary algorithms, serves as a basis for the study of more complex approaches and is therefore intensively investigated in the theory of randomized search heuristics (Auger and Doerr, 2011). For the case of minimization, it is defined as Algorithm 1.

Algorithm 1 (1+1) EA

t := 0.
choose uniformly at random an initial bit string x_0 ∈ {0,1}^n.
repeat
    create x' by flipping each bit in x_t independently with prob. p (mutation).
    x_{t+1} := x' if f(x') ≤ f(x_t), and x_{t+1} := x_t otherwise (selection).
    t := t + 1.
until forever.
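
For readers who want to experiment, Algorithm 1 can be sketched in a few lines of Python. This is only an illustrative sketch: the names `one_plus_one_ea` and `linear_function`, the `max_steps` cap and the `optimum_value` argument are assumptions made here and do not appear in the paper, which lets the algorithm run forever and measures the first time the optimum is reached.

```python
import random


def linear_function(weights):
    """Return f(x) = sum_i w_i * x_i for a fixed weight list (w_0 = 0 assumed)."""
    return lambda x: sum(w * b for w, b in zip(weights, x))


def one_plus_one_ea(f, n, p, max_steps, rng=None, optimum_value=0):
    """Minimize f over {0,1}^n with the (1+1) EA.

    Returns the first time t at which f(x_t) == optimum_value, or None if
    this does not happen within max_steps iterations (hypothetical cap,
    needed only because a real program has to stop).
    """
    rng = rng or random.Random()
    x = [rng.randint(0, 1) for _ in range(n)]  # uniform initial search point x_0
    fx = f(x)
    for t in range(max_steps + 1):
        if fx == optimum_value:
            return t
        # Mutation: flip each bit independently with probability p.
        y = [b ^ 1 if rng.random() < p else b for b in x]
        fy = f(y)
        # Selection: accept offspring that are not worse (ties are accepted).
        if fy <= fx:
            x, fx = y, fy
    return None
```

A call such as `one_plus_one_ea(linear_function([1] * 20), 20, 1 / 20, 10**5)` returns one sample of the optimization time on OneMax under minimization.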



The (1+1) EA can be considered a simple hill-climber where search points are drawn from a stochastic neighborhood based on the mutation operator. The parameter $p$, where $0 < p < 1$, is often chosen as $1/n$, which then is called the standard mutation probability. We call a mutation from $x_t$ to $x'$ accepted if $f(x') \le f(x_t)$, i.e., if the new search point is taken over; otherwise we call it rejected. In our theoretical studies, we ignore the fact that the algorithm in practice will be stopped at some time. The runtime (synonymously, optimization time) of the (1+1) EA is defined as the first random point in time $t$ such that the search point $x_t$ has optimal, i.e., minimum $f$-value. This corresponds to the number of $f$-evaluations until reaching the optimum. In many cases, one is aiming for results on the expected optimization time. Here, we also prove results that hold with high probability (w.h.p.), which means probability $1 - o(1)$.

The (1+1) EA is also an instantiation of the algorithmic scheme that is called mutation-based EA by Sudholt (2010) and is displayed as Algorithm 2. It is a general population-based approach that includes many variants of evolutionary algorithms with parent and offspring populations as well as parallel evolutionary algorithms. Any mechanism for managing the populations, which are multisets, is allowed as long as the mutation operator is the only variation operator and follows the independent bit-flip property with probability $0 < p \le 1/2$. Again the smallest $t$ such that $x_t$ is optimal defines the runtime. Sudholt has proved for $p = 1/n$ that no mutation-based EA can locate a unique optimum faster than the (1+1) EA can optimize OneMax. We will see that the (1+1) EA is the best mutation-based EA on a broad class of functions, also for different mutation probabilities.



Algorithm 2 Scheme of a mutation-based EA

for t := 0 → μ − 1 do
    create x_t ∈ {0,1}^n uniformly at random.
end for
repeat
    select a parent x ∈ {x_0, …, x_t} according to t and f(x_0), …, f(x_t).
    create x_{t+1} by flipping each bit in x independently with probability p ≤ 1/2.
    t := t + 1.
until forever.



Throughout this paper, we are concerned with linear pseudo-boolean functions. A function $f\colon \{0,1\}^n \to \mathbb{R}$ is called linear if it can be written as $f(x_n, \dots, x_1) = w_n x_n + \dots + w_1 x_1 + w_0$. As common in the analysis of the (1+1) EA, we assume w.l.o.g. that $w_0 = 0$ and $w_n \ge \dots \ge w_1 > 0$ hold. Search points are read from $x_n$ down to $x_1$ such that $x_n$, the most significant bit, is said to be on the left-hand side and $x_1$, the least significant bit, on the right-hand side. Since it fits the proof techniques more naturally, we assume also w.l.o.g. that the (1+1) EA (or, more generally, the mutation-based EA at hand) is minimizing $f$, implying that the all-zeros string is the optimum. Our assumptions do not lose generality since we can permute bits and negate the weights of a linear function without affecting the stochastic behavior of the (1+1) EA/mutation-based EA.

Probably the most intensively studied linear function is $\mathrm{OneMax}(x_n, \dots, x_1) = x_n + \dots + x_1$, occasionally also called the CountingOnes problem (which would be the more appropriate name here since we will be minimizing the function). In this paper, we will see that on the one hand, OneMax is not only the easiest linear function definition-wise but also in terms of expected optimization time. On the other hand, the upper bounds obtained for OneMax hold for every linear function up to lower-order terms. Hence, surprisingly, the (1+1) EA is basically as efficient on an arbitrary linear function as it is on OneMax. This underlines the robustness of the randomized search heuristic and, in retrospect and for the future, is a strong motivation to investigate the behavior of randomized search heuristics on the OneMax problem thoroughly.

Our proofs of the forthcoming upper bounds use the multiplicative drift theorem in its most recent version (cf. Doerr, Johannsen and Winzen, 2010b and Doerr and Goldberg, 2011). The key idea of multiplicative drift is to identify a time-independent relative progress called drift.

Theorem 1 (Multiplicative Drift, Upper Bound). Let $S \subseteq \mathbb{R}$ be a finite set of positive numbers with minimum 1. Let $\{X^{(t)}\}_{t \ge 0}$ be a sequence of random variables over $S \cup \{0\}$. Let $T$ be the random first point in time $t \ge 0$ for which $X^{(t)} = 0$. Suppose that there exists a $\delta > 0$ such that

$$E(X^{(t)} - X^{(t+1)} \mid X^{(t)} = s) \ge \delta s$$

for all $s \in S$ with $\mathrm{Prob}(X^{(t)} = s) > 0$. Then for all $s_0 \in S$ with $\mathrm{Prob}(X^{(0)} = s_0) > 0$,

$$E(T \mid X^{(0)} = s_0) \le \frac{\ln(s_0) + 1}{\delta}.$$

Moreover, it holds that $\mathrm{Prob}(T > (\ln(s_0) + t)/\delta) \le e^{-t}$.

As an easy example application, consider the (1+1) EA on OneMax and let $X^{(t)}$ denote the number of one-bits at time $t$. As worse search points are not accepted, $X^{(t)}$ is non-increasing over time. We obtain $E(X^{(t)} - X^{(t+1)} \mid X^{(t)} = s) \ge s(1/n)(1 - 1/n)^{n-1} \ge s/(en)$, in other words a multiplicative drift of at least $\delta = 1/(en)$, since there are $s$ disjoint single-bit flips that decrease the $X$-value by 1. Theorem 1 applied with $\delta = 1/(en)$ and $\ln(X^{(0)}) \le \ln n$ gives us the upper bound $en(\ln n + 1)$ on the expected optimization time, which is the same as the classical method of fitness-based partitions (Wegener, 2001; Sudholt, 2010) or coupon collector arguments (Motwani and Raghavan, 1995) would yield.
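
The arithmetic of this OneMax example can be checked numerically with a short Python sketch. The helper names `drift_upper_bound` and `onemax_drift_lower_bound` are hypothetical and chosen here only for illustration; the code merely evaluates the expressions from the multiplicative drift theorem and the single-bit-flip argument and is not part of the paper.

```python
import math


def drift_upper_bound(s0, delta, t=1.0):
    """Evaluate (ln(s0) + t) / delta.

    With t = 1 this is the bound on E(T | X^(0) = s0); for general t it is
    the threshold exceeded with probability at most e^(-t).
    """
    return (math.log(s0) + t) / delta


def onemax_drift_lower_bound(s, n):
    """Lower bound s * (1/n) * (1 - 1/n)^(n-1) on the drift with s one-bits left."""
    return s * (1 / n) * (1 - 1 / n) ** (n - 1)


n = 100
delta = 1 / (math.e * n)
# Each of the s single-bit flips contributes at least 1/(en) to the drift.
assert all(onemax_drift_lower_bound(s, n) >= s * delta for s in range(1, n + 1))
# Worst-case start X^(0) = n yields the bound en(ln n + 1).
bound = drift_upper_bound(n, delta)
```

For `n = 100` the value of `bound` coincides with $en(\ln n + 1)$ up to floating-point rounding.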

On a general linear function, it is not necessarily a good choice to let $X^{(t)}$ count the current number of one-bits. Consider, for example, the well-known function $\mathrm{BinVal}(x_n, \dots, x_1) = \sum_{i=1}^{n} 2^{i-1} x_i$. The (1+1) EA might replace the search point $(1, 0, \dots, 0)$ by the better search point $(0, 1, \dots, 1)$, amounting to a loss of $n - 2$ zero-bits. More generally, replacing $(1, 0, \dots, 0)$ by a better search point is equivalent to flipping the leftmost one-bit. In such a step, an expected number of $(n-1)p$ zero-bits flip, so the expected number of one-bits decreases by only $1 - (n-1)p$. The latter expectation (the so-called additive drift) is only $1/n$ for the standard mutation probability $p = 1/n$ and might be negative for larger $p$. Therefore, $X^{(t)}$ is typically defined as $X^{(t)} := g(x^{(t)})$, where $x^{(t)}$ is the current search point at time $t$ and $g(x_n, \dots, x_1)$ is another linear function called drift function or potential function. Doerr, Johannsen and Winzen (2010b) use $x_1 + \dots + x_{n/2} + (5/4)(x_{n/2+1} + \dots + x_n)$ as potential function in their application of the multiplicative drift theorem. This leads to a good lower bound on the multiplicative drift on the one hand and a small maximum value of $X^{(t)}$ on the other hand. In our proofs of upper bounds in Sections 4 and 5, it is crucial to define appropriate potential functions.
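
The BinVal example can be made concrete with a small Python sketch. The function names `binval` and `expected_onebit_decrease` are assumptions introduced here for illustration; the search point is stored as a list $(x_n, \dots, x_1)$ with the most significant bit first.

```python
def binval(x):
    """BinVal for x = (x_n, ..., x_1): the leftmost entry has weight 2^(n-1)."""
    n = len(x)
    return sum(bit << (n - 1 - pos) for pos, bit in enumerate(x))


def expected_onebit_decrease(n, p):
    """Additive drift 1 - (n-1)p of the one-bit count when the single
    leftmost one-bit of (1, 0, ..., 0) flips and each of the n-1 zero-bits
    flips independently with probability p."""
    return 1 - (n - 1) * p


n = 8
x = [1] + [0] * (n - 1)  # (1, 0, ..., 0)
y = [0] + [1] * (n - 1)  # (0, 1, ..., 1)
improves = binval(y) < binval(x)   # y is accepted under minimization
extra_ones = sum(y) - sum(x)       # n - 2 additional one-bits
```

Here `improves` is true although the number of one-bits grows by `extra_ones = n - 2`, which is why counting one-bits is a poor potential on BinVal. The additive drift `expected_onebit_decrease(n, p)` equals $1/n$ for $p = 1/n$ and becomes negative once $p > 1/(n-1)$.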

For the lower bounds in Section 6, we need the following variant of the multiplicative drift theorem.

Theorem 2 (Multiplicative Drift, Lower Bound). Let $S \subseteq \mathbb{R}$ be a finite set of positive numbers with minimum 1. Let $\{X^{(t)}\}_{t \ge 0}$ be a sequence of random variables over $S$, where $X^{(t+1)} \le X^{(t)}$ for any $t \ge 0$, and let $s_{\min} > 0$. Let $T$ be the random first point in time $t \ge 0$ for which $X^{(t)} \le s_{\min}$. If there exist positive reals $\beta, \delta \le 1$ such that for all $s > s_{\min}$ and all $t \ge 0$ with $\mathrm{Prob}(X^{(t)} = s) > 0$ it holds that

1. $E(X^{(t)} - X^{(t+1)} \mid X^{(t)} = s) \le \delta s$,

2. $\mathrm{Prob}(X^{(t)} - X^{(t+1)} \ge \beta s \mid X^{(t)} = s) \le \beta\delta/\ln s$,

then for all $s_0 \in S$ with $\mathrm{Prob}(X^{(0)} = s_0) > 0$,

$$E(T \mid X^{(0)} = s_0) \ge \frac{\ln(s_0) - \ln(s_{\min})}{\delta} \cdot \frac{1 - \beta}{1 + \beta}.$$

Compared to the upper bound, the lower-bound version includes a condition on the maximum stepwise progress and requires non-increasing sequences. As a technical detail, the theorem allows for a positive target $s_{\min}$, which is required in our applications.



3 Summary of Main Results

We now list the main consequences from the lower bounds and upper bounds that we will prove in the following sections.

Theorem 3. On any linear function, the following holds for the expected optimization time $E(T_p)$ of the (1+1) EA with mutation probability $p$.

1. If $p = \omega((\ln n)/n)$ or $p = o(1/\mathrm{poly}(n))$ then $E(T_p)$ is superpolynomial.

2. If $p = \Omega(1/\mathrm{poly}(n))$ and $p = O((\ln n)/n)$ then $E(T_p)$ is polynomial.

3. If $p = c/n$ for a constant $c$ then $E(T_p) = (1 \pm o(1))\frac{e^c}{c} n\ln n$.

4. $E(T_p)$ is minimized for mutation probability $p = 1/n$ if $n$ is large enough.

5. No mutation-based EA has an expected optimization time that is smaller than $E(T_{1/n})$ (up to lower-order terms).

In fact, our forthcoming analyses are more precise; in particular, we do not state available tails on the upper bounds above and leave them in the more general, but also more complicated Theorem 4 in Section 4. The first statement of our summarizing Theorem 3 follows from Theorems 8 and 9 in Section 6. The second statement is proven in Corollary 2, which follows from the already mentioned Theorem 4. The third statement takes together Corollaries 1 and 3. Since $e^c/c$ is minimized for $c = 1$, the fourth statement follows from the third one in conjunction with Corollary 3. The fifth statement is also contained in Theorems 7 and 9.

It is worth noting that the optimality of $p = 1/n$ apparently was never proven rigorously before, not even for the case of OneMax,² where tight upper and lower bounds on the expected optimization time were only available for the standard mutation probability (Sudholt, 2010; Doerr, Fouz and Witt, 2011). For the general case of linear functions, the strongest previous result said that $p = \Theta(1/n)$ is optimal (Droste, Jansen and Wegener, 2002). Our result on the optimality of the mutation probability $1/n$ is interesting since this is the choice commonly recommended by practitioners.
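
The claim that $e^c/c$ is minimized for $c = 1$ follows from the derivative $e^c(c-1)/c^2$, which vanishes only at $c = 1$. The following Python sketch checks this on a grid; the name `leading_constant` and the grid itself are illustrative assumptions, not part of the paper.

```python
import math


def leading_constant(c):
    """Leading constant e^c / c of the n ln n term for mutation probability c/n."""
    return math.exp(c) / c


# Grid of constants c = 0.1, 0.2, ..., 5.0 (arbitrary illustrative range).
grid = [k / 10 for k in range(1, 51)]
best_c = min(grid, key=leading_constant)
```

On this grid `best_c` is `1.0`, with `leading_constant(1.0)` equal to $e$; halving or doubling the standard mutation probability already increases the constant to about $3.30$ and $3.69$, respectively.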



4 Upper Bounds 

In this section, we show a general upper bound that applies to any non-trivial muta- 
tion probability. 

Theorem 4. On any linear function, the optimization time of the (1+1) EA with mutation probability $0 < p < 1$ is at most

$$(1-p)^{1-n} \left( \frac{n\alpha^2 (1-p)^{1-n}}{\alpha - 1} + \frac{\alpha}{\alpha - 1} \cdot \frac{\ln(1/p) + (n-1)\ln(1-p) + t}{p} \right) =: b(t)$$

with probability at least $1 - e^{-t}$, and it is at most $b(1)$ in expectation, where $\alpha > 1$ can be chosen arbitrarily (also depending on $n$).
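
To get a feeling for the size of this bound, it can be evaluated numerically. The Python sketch below assumes that $b(t)$ has the form $(1-p)^{1-n}\bigl(n\alpha^2(1-p)^{1-n}/(\alpha-1) + \frac{\alpha}{\alpha-1}(\ln(1/p) + (n-1)\ln(1-p) + t)/p\bigr)$, which is the expression obtained at the end of the proof of Theorem 4; the function names are hypothetical.

```python
import math


def b(t, n, p, alpha):
    """Evaluate the bound b(t) of Theorem 4 (form assumed as in the lead-in)."""
    growth = (1 - p) ** (1 - n)  # (1-p)^(1-n) >= 1
    first = n * alpha ** 2 * growth / (alpha - 1)
    log_term = math.log(1 / p) + (n - 1) * math.log(1 - p) + t
    second = alpha / (alpha - 1) * log_term / p
    return growth * (first + second)


def b_from_proof(t, n, p, alpha):
    """Same quantity written as in the proof: alpha * (...) / ((alpha-1) p (1-p)^(n-1))."""
    growth = (1 - p) ** (1 - n)
    numerator = alpha * (n * alpha * p * growth + math.log(1 / p)
                         + (n - 1) * math.log(1 - p) + t)
    return numerator / ((alpha - 1) * p * (1 - p) ** (n - 1))
```

Both functions agree up to rounding, and `b(1, n, 1 / n, alpha)` gives the bound on the expected optimization time for the standard mutation probability. For moderate $n$ the factor $\alpha/(\alpha-1)$ is still far from 1, so the asymptotic leading constant only becomes visible for very large $n$.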

Before we prove the theorem, we note two important consequences in more readable form. The first one (Corollary 1) displays upper bounds for mutation probabilities $c/n$. The second one (Corollary 2) is used in Theorem 3 above, which states a phase transition from polynomial to superpolynomial expected optimization times at mutation probability $p = \Theta((\ln n)/n)$.

Corollary 1. On any linear function, the optimization time of the (1+1) EA with mutation probability $p = c/n$, where $c > 0$ is a constant, is bounded from above by $(1 + o(1))((e^c/c)\, n\ln n)$ with probability $1 - o(1)$ and also in expectation.



² A recent technical report extending Sudholt (2010) shows the optimality of $p = 1/n$ in the case of OneMax using a different approach, see http://arxiv.org/abs/1109.1504.

Proof. Let $\alpha := \ln\ln n$ or any other sufficiently slowly growing function. Then $\alpha/(\alpha - 1) = 1 + O(1/\ln\ln n)$ and $\alpha^2/(\alpha - 1) = O(\ln\ln n)$. Moreover, $(1 - c/n)^{1-n} \le e^c$. The $b(t)$ in Theorem 4 becomes at most

$$e^{2c} \cdot O(n\ln\ln n) + (1 + o(1))\, e^c \cdot \frac{n(\ln(n) + \ln(1/c) + t)}{c},$$

and the corollary follows by choosing, e.g., $t := \ln\ln n$. □

Corollary 2. On any linear function, the optimization time of the (1+1) EA with mutation probability $p = O((\ln n)/n)$ and $p = \Omega(1/\mathrm{poly}(n))$ is polynomial with probability $1 - o(1)$ and also in expectation.

Proof. Let $\alpha := 2$. By making all positive terms at least 1 and multiplying them, we obtain that the upper bound $b(t)$ from Theorem 4 is at most

$$8n(1-p)^{2-2n} \cdot \frac{\ln(e/p) + t}{p} \le 8n e^{2pn} \cdot \frac{\ln(e/p) + t}{p}.$$

Assume $p = \Omega(1/\mathrm{poly}(n))$ and $p \le c(\ln n)/n$ for some constant $c$ and sufficiently large $n$. Then $e^{2pn} \le n^{2c}$ and the whole expression is polynomial for $t = 1$ (proving the expectation) and also if $t = \ln n$ (proving the probability $1 - o(1)$). □

The proof of Theorem 4 uses an adaptive potential function as in Doerr and Goldberg (2011). That is, the random variables $X^{(t)}$ used in Theorem 1 map the current search point of the (1+1) EA via a potential function to some value in a way that depends also on the linear function at hand. As a special case, if the given linear function happens to be OneMax, $X^{(t)}$ just counts the number of one-bits at time $t$. The general construction shares some similarities with the one in Doerr and Goldberg (2011), but both construction and proof are less involved.



Proof of Theorem 4. Let $f(x) = w_n x_n + \dots + w_1 x_1$ be the linear function at hand. Define

$$\gamma_i := \left(1 + \frac{\alpha p}{(1-p)^{n-1}}\right)^{i-1}$$

for $1 \le i \le n$, and let $g(x) = g_n x_n + \dots + g_1 x_1$ be the potential function defined by $g_1 := 1 = \gamma_1$ and

$$g_i := \min\left\{\gamma_i,\; g_{i-1} \cdot \frac{w_i}{w_{i-1}}\right\}$$

for $2 \le i \le n$. Note that the $g_i$ are non-decreasing w.r.t. $i$. Intuitively, if the ratio of $w_i$ and $w_{i-1}$ is too extreme, the minimum function caps it appropriately, otherwise $g_i$ and $g_{i-1}$ are in the same ratio. We consider the stochastic process $X^{(t)} := g(a^{(t)})$, where $a^{(t)}$ is the current search point of the (1+1) EA at time $t$. Obviously, $X^{(t)} = 0$ if and only if $f$ has been optimized.
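
The capped potential can be computed directly from the weights. The following Python sketch assumes the definition $\gamma_i = (1 + \alpha p/(1-p)^{n-1})^{i-1}$, $g_1 = 1$ and $g_i = \min\{\gamma_i, g_{i-1} w_i/w_{i-1}\}$; the function name `potential_weights` and the list layout `w = [w_1, ..., w_n]` (non-decreasing, positive) are assumptions made for this illustration.

```python
def potential_weights(w, p, alpha):
    """Return [g_1, ..., g_n] for weights w = [w_1, ..., w_n].

    Assumes 0 < w_1 <= ... <= w_n, 0 < p < 1 and alpha > 1.
    """
    n = len(w)
    base = 1 + alpha * p / (1 - p) ** (n - 1)
    g = [1.0]  # g_1 = gamma_1 = 1
    for i in range(1, n):  # 0-based index i corresponds to bit i + 1
        gamma = base ** i
        # Keep the weight ratio unless it would exceed the geometric cap gamma.
        g.append(min(gamma, g[-1] * w[i] / w[i - 1]))
    return g


onemax_g = potential_weights([1] * 10, p=0.1, alpha=2.0)
binval_g = potential_weights([2 ** i for i in range(10)], p=0.1, alpha=2.0)
```

For OneMax all ratios are 1, so `onemax_g` consists of ones and the potential counts one-bits, as stated in the text. For BinVal with these parameters every ratio 2 exceeds the cap, so `binval_g` is the geometric sequence of the $\gamma_i$.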

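Let $\Delta_t := X^{(t)} - X^{(t+1)}$. We will show below that

$$E(\Delta_t \mid X^{(t)} = s) \ge s \cdot p \cdot (1-p)^{n-1} \cdot \left(1 - \frac{1}{\alpha}\right). \qquad (*)$$

The initial value satisfies

$$X^{(0)} \le g_n + \dots + g_1 \le \sum_{i=1}^{n} \gamma_i \le \frac{\left(1 + \frac{\alpha p}{(1-p)^{n-1}}\right)^{n} - 1}{\alpha p (1-p)^{1-n}} \le \frac{e^{n\alpha p (1-p)^{1-n}}}{\alpha p (1-p)^{1-n}},$$

which, since $\alpha > 1$, means

$$\ln(X^{(0)}) \le n\alpha p (1-p)^{1-n} + \ln(1/p) + \ln((1-p)^{n-1}).$$

The multiplicative drift theorem (Theorem 1) yields that the optimization time $T$ is bounded from above by

$$\frac{\ln(X^{(0)}) + t}{p(1-p)^{n-1}(1 - 1/\alpha)} \le \frac{\alpha\left(n\alpha p(1-p)^{1-n} + \ln(1/p) + \ln((1-p)^{n-1}) + t\right)}{(\alpha - 1)\, p\, (1-p)^{n-1}} = b(t)$$

with probability at least $1 - e^{-t}$, and $E(T) \le b(1)$, which proves the theorem.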

To show $(*)$, we fix an arbitrary current value $s$ and an arbitrary search point $a^{(t)}$ satisfying $g(a^{(t)}) = s$. In the following, we implicitly assume $X^{(t)} = s$ but mostly omit this for the sake of readability. We denote by $I := \{i \mid a^{(t)}_i = 1\}$ the index set of the one-bits in $a^{(t)}$ and by $Z := \{1, \dots, n\} \setminus I$ the zero-bits. We assume $I \ne \emptyset$ since there is nothing to show otherwise. Denote by $a'$ the random (not necessarily accepted) offspring produced by the (1+1) EA when mutating $a^{(t)}$ and by $a^{(t+1)}$ the next search point after selection. Recall that $a^{(t+1)} = a'$ if and only if $f(a') \le f(a^{(t)})$. In the following, we will use the event $A$ that $a^{(t+1)} = a' \ne a^{(t)}$ since obviously $\Delta_t = 0$ otherwise. Let $I^* := \{i \in I \mid a'_i = 0\}$ be the random set of flipped one-bits and $Z^* := \{i \in Z \mid a'_i = 1\}$ be the set of flipped zero-bits in $a'$ (not conditioned on $A$). Note that $I^* \ne \emptyset$ if $A$ occurs.

We need further definitions to analyze the drift carefully. For $i \in I$, we define $k(i) := \max\{j \le i \mid g_j = \gamma_j\}$ as the most significant position to the right of $i$ (possibly $i$ itself) where the potential function might be capping; note that $k(i) \ge 1$ since $g_1 = \gamma_1$. Let $L(i) := \{k(i), \dots, n\} \cap Z$ be the set of zero-bits left of (and including) $k(i)$ and let $R(i) := \{1, \dots, k(i) - 1\} \cap Z$ be the remaining zero-bits. Both sets may be empty. For event $A$ to occur, it is necessary that there is some $i \in I$ such that bit $i$ flips to zero and

$$\sum_{j \in I^*} w_j - \sum_{j \in Z^* \cap L(i)} w_j \ge 0,$$

since we are taking only zero-bits out of consideration. Now, for $i \in I$, let $A_i$ be the event that

1. $i$ is the leftmost flipping one-bit (i.e., $i \in I^*$ and $\{i+1, \dots, n\} \cap I^* = \emptyset$) and

2. $\sum_{j \in I^*} w_j - \sum_{j \in Z^* \cap L(i)} w_j \ge 0$.

If none of the $A_i$ occurs, $\Delta_t = 0$. Furthermore, the $A_i$ are mutually disjoint.
For any $i \in I$, $\Delta_t$ can be written as the sum of the two terms

$$\Delta_L(i) := \sum_{j \in I^*} g_j - \sum_{j \in Z^* \cap L(i)} g_j \quad\text{and}\quad \Delta_R(i) := -\sum_{j \in Z^* \cap R(i)} g_j.$$

By the law of total probability and the linearity of expectation, we have

$$E(\Delta_t) = \sum_{i \in I} \Bigl( E(\Delta_L(i) \mid A_i) \cdot \mathrm{Prob}(A_i) + E(\Delta_R(i) \mid A_i) \cdot \mathrm{Prob}(A_i) \Bigr). \qquad (**)$$

In the following, the bits in $R(i)$ are pessimistically assumed to flip to 1 independently with probability $p$ each if $A_i$ happens. This leads to $E(\Delta_R(i) \mid A_i) \ge -p \sum_{j \in R(i)} g_j$.

In order to estimate $E(\Delta_L(i))$, we carefully inspect the relation between the weights of the original function and the potential function. By definition, we obtain $g_j/g_{k(i)} = w_j/w_{k(i)}$ for $k(i) \le j \le i$ and $g_j/g_{k(i)} \le w_j/w_{k(i)}$ for $j > i$, whereas $g_j/g_{k(i)} \ge w_j/w_{k(i)}$ for $j < k(i)$. Hence, if $A_i$ occurs then $g_j \ge g_{k(i)} \cdot \frac{w_j}{w_{k(i)}}$ for $j \in I^*$ (since $i$ is the leftmost flipping one-bit), whereas $g_j \le g_{k(i)} \cdot \frac{w_j}{w_{k(i)}}$ for $j \in L(i)$. Together, we obtain under $A_i$ the nonnegativity of the random variable $\Delta_L(i)$:

$$(\Delta_L(i) \mid A_i) = \sum_{j \in I^* \mid A_i} g_j - \sum_{j \in (Z^* \cap L(i)) \mid A_i} g_j \ge \sum_{j \in I^* \mid A_i} g_{k(i)} \frac{w_j}{w_{k(i)}} - \sum_{j \in (Z^* \cap L(i)) \mid A_i} g_{k(i)} \frac{w_j}{w_{k(i)}} \ge 0,$$

using the definition of $A_i$.

Now let $S_i := \{|Z^* \cap L(i)| = 0\}$ be the event that no zero-bit from $L(i)$ flips. Using the law of total probability, we obtain that

$$E(\Delta_L(i) \mid A_i) \cdot \mathrm{Prob}(A_i) = E(\Delta_L(i) \mid A_i \cap S_i) \cdot \mathrm{Prob}(A_i \cap S_i) + E(\Delta_L(i) \mid A_i \cap \overline{S_i}) \cdot \mathrm{Prob}(A_i \cap \overline{S_i}).$$

Since $(\Delta_L(i) \mid A_i) \ge 0$, the conditional expectations are non-negative. We bound the second term on the right-hand side by 0. In conjunction with $(**)$, we get

$$E(\Delta_t) \ge \sum_{i \in I} \Bigl( E(\Delta_L(i) \mid A_i \cap S_i) \cdot \mathrm{Prob}(A_i \cap S_i) + E(\Delta_R(i) \mid A_i) \cdot \mathrm{Prob}(A_i) \Bigr).$$
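
Obviously, $E(\Delta_L(i) \mid A_i \cap S_i) \ge g_i$. We estimate $\mathrm{Prob}(A_i \cap S_i) \ge p(1-p)^{n-1}$ since it is sufficient to flip only bit $i$, and $\mathrm{Prob}(A_i) \le p$ since it is necessary to flip this bit. Further above, we have bounded $E(\Delta_R(i) \mid A_i)$. Taking everything together, we get

$$E(\Delta_t) \ge \sum_{i \in I} \Bigl( p(1-p)^{n-1} g_i - p^2 \sum_{j \in R(i)} g_j \Bigr) \ge \sum_{i \in I} \Bigl( p(1-p)^{n-1} g_i - p^2 \sum_{j=1}^{k(i)-1} \gamma_j \Bigr).$$

The term for $i$ equals

$$p(1-p)^{n-1} g_i - p^2 \cdot \frac{\left(1 + \frac{\alpha p}{(1-p)^{n-1}}\right)^{k(i)-1} - 1}{\alpha p/(1-p)^{n-1}} \ge p(1-p)^{n-1} g_i - \frac{p(1-p)^{n-1}}{\alpha}\, g_{k(i)} \ge p(1-p)^{n-1}\left(1 - \frac{1}{\alpha}\right) g_i,$$

where the first inequality uses $g_{k(i)} = \gamma_{k(i)} = (1 + \alpha p/(1-p)^{n-1})^{k(i)-1}$ and the last inequality uses $g_i \ge g_{k(i)}$. Hence,

$$E(\Delta_t) \ge \sum_{i \in I} p(1-p)^{n-1}\left(1 - \frac{1}{\alpha}\right) g_i = s \cdot p \cdot (1-p)^{n-1} \cdot \left(1 - \frac{1}{\alpha}\right),$$

which proves $(*)$ and, therefore, the theorem. □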



5 Refined Upper Bound for Mutation Probability 1/n

In this section, we consider the standard mutation probability $p = 1/n$ and refine the result from Corollary 1. More precisely, we obtain that the lower-order terms are $O(n)$. The proof will be shorter and uses a simpler potential function.

Theorem 5. On any linear function, the expected optimization time of the (1+1) EA with $p = 1/n$ is at most $en\ln n + 2en + O(1)$, and the probability that the optimization time exceeds $en\ln n + (1 + t)en + O(1)$ is at most $e^{-t}$.

Proof. Let $f(x) = w_n x_n + \dots + w_1 x_1$ be the linear function at hand and let $g(x) = g_n x_n + \dots + g_1 x_1$ be the potential function defined by

$$g_i := \left(1 + \frac{1}{n-1}\right)^{\min\{j \le i \,\mid\, w_j = w_i\} - 1},$$

hence $g_i = (1 + 1/(n-1))^{i-1}$ for all $i$ if and only if the $w_i$ are mutually distinct. We consider the stochastic process $X^{(t)} := g(a^{(t)})$, where $a^{(t)}$ is the current search point of the (1+1) EA at time $t$. Obviously, $X^{(t)} = 0$ if and only if $f$ has been optimized.

Let $\Delta_t := X^{(t)} - X^{(t+1)}$. In a case analysis (partly inspired by Doerr, Johannsen and Winzen, 2010b), we will show below for $n \ge 4$ that $E(\Delta_t \mid X^{(t)} = s) \ge s/(en)$.

The initial value satisfies

$$X^{(0)} \le g_n + \dots + g_1 \le \sum_{i=1}^{n} \left(1 + \frac{1}{n-1}\right)^{i-1} = \frac{\left(1 + \frac{1}{n-1}\right)^{n} - 1}{1/(n-1)} \le (n-1)\left(1 + \frac{1}{n-1}\right)e - (n-1) \le en,$$

where we have used $(1 + 1/(n-1))^{n-1} \le e$. Hence, $\ln(X^{(0)}) \le (\ln n) + 1$. Assuming $n \ge 4$, Theorem 1 yields $E(T) \le en(\ln(n) + 2)$ and $\mathrm{Prob}(T > en((\ln n) + t + 1)) \le e^{-t}$ regardless of the starting point, from which the theorem follows.
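
The simpler potential of this section and the bound $g_n + \dots + g_1 \le en$ on its maximum value can be checked numerically. The sketch below assumes $g_i = (1 + 1/(n-1))^{\min\{j \le i \mid w_j = w_i\} - 1}$ with weights sorted in non-decreasing order; the name `refined_potential_weights` is hypothetical.

```python
import math


def refined_potential_weights(w):
    """Return [g_1, ..., g_n] for non-decreasing weights w = [w_1, ..., w_n], n >= 2.

    Bits of equal weight share the exponent of the first (rightmost) bit
    carrying that weight.
    """
    n = len(w)
    base = 1 + 1 / (n - 1)
    g = []
    first = 0  # 0-based index of the first bit with the current weight
    for i in range(n):
        if i > 0 and w[i] != w[i - 1]:
            first = i
        g.append(base ** first)
    return g


distinct = refined_potential_weights(list(range(1, 51)))  # 50 distinct weights
max_potential = sum(distinct)                              # X^(0) if all bits are 1
```

With 50 mutually distinct weights, `max_potential` is about $85.6$, which is below $en \approx 135.9$, in line with the estimate above. For weights with ties, such as `[1, 1, 2, 2, 3]`, equal weights receive equal potential.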

The case analysis fixes an arbitrary current search point $a^{(t)}$. We denote by $I := \{i \mid a^{(t)}_i = 1\}$ the index set of its one-bits and by $Z := \{1, \dots, n\} \setminus I$ its zero-bits. We assume $I \ne \emptyset$ since there is nothing to show otherwise. Denote by $a'$ the random (not necessarily accepted) offspring produced by the (1+1) EA when mutating $a^{(t)}$ and by $a^{(t+1)}$ the next search point after selection. Recall that $a^{(t+1)} = a'$ if and only if $f(a') \le f(a^{(t)})$. In what follows, we will often condition on the event $A$ that $a^{(t+1)} = a' \ne a^{(t)}$ holds since $\Delta_t = 0$ otherwise. Let $I^* := \{i \in I \mid a'_i = 0\}$ be the set of flipped one-bits and $Z^* := \{i \in Z \mid a'_i = 1\}$ be the set of flipped zero-bits. Note that $I^* \ne \emptyset$ if $A$ occurs.

Case 1: Event $S_1 := \{|I^*| \ge 2\} \cap A$ occurs. Under this condition, each zero-bit in $a^{(t)}$ has been flipped to 1 in $a^{(t+1)}$ with probability at most $1/n$. Since $g_i \ge 1$ for $1 \le i \le n$, we have

$$E(\Delta_t \mid S_1) \ge 2 - \frac{1}{n} \sum_{i=1}^{n} \left(1 + \frac{1}{n-1}\right)^{i-1} = 2 - \frac{n-1}{n}\left(\left(1 - \frac{1}{n}\right)^{-n} - 1\right) \ge 0$$

for $n \ge 4$, where we have used $1 + 1/(n-1) = 1/(1 - 1/n)$. Hence, we pessimistically assume $E(\Delta_t \mid S_1) = 0$.

Case 2: Event $S_2 := \{|I^*| = 1\} \cap A$ occurs. Let $i^*$ be the single element of $I^*$ and note that this is a random variable.

Subcase 2.1: $S_{21} := \{|I^*| = 1\} \cap \{Z^* = \emptyset\} \cap A$ occurs. Since $\{|I^*| = 1\}$ and $\{Z^* = \emptyset\}$ together imply $A$, the index $i^*$ of the flipped one-bit is uniform over $I$. Hence, $E(\Delta_t \mid S_{21}) = \sum_{i \in I} g_i/|I|$. Moreover, $\mathrm{Prob}(S_{21}) \ge |I|(1/n)(1 - 1/n)^{n-1} \ge |I|/(en)$, implying $E(\Delta_t \mid S_{21}) \cdot \mathrm{Prob}(S_{21}) \ge g(a^{(t)})/(en) = X^{(t)}/(en)$. If we can show that $E(\Delta_t \mid \{|I^*| = 1\} \cap \{|Z^*| \ge 1\} \cap A) \ge 0$, which will be proven in Subcase 2.2 below, then $E(\Delta_t \mid X^{(t)} = s) \ge s/(en)$ follows by the law of total probability and the proof is complete.

Subcase 2.2: $S_{22} := \{|I^*| = 1\} \cap \{|Z^*| \ge 1\} \cap A$ occurs. Let $j^* := \max\{j \mid j \in Z^*\}$ be the index of the leftmost flipping zero-bit, and note that $j^*$ is also random. Since we work under $|I^*| = 1$ and the $w_j$ are monotone increasing w.r.t. $j$, it is necessary for $A$ to occur that $w_{j^*} \le w_{i^*}$ holds.

Subcase 2.2.1: $S_{221} := \{|I^*| = 1\} \cap \{|Z^*| \ge 1\} \cap \{j^* > i^*\} \cap A$ occurs. Then $w_{j^*} = w_{i^*}$ and $|Z^*| = 1$ must hold. In this case, $g_{j^*} = g_{i^*}$ by the definition of $g$, and $E(\Delta_t \mid S_{221}) = 0$ follows immediately.

Subcase 2.2.2: $S_{222} := \{|I^*| = 1\} \cap \{|Z^*| \ge 1\} \cap \{j^* < i^*\} \cap A$ occurs. If $w_{j^*} = w_{i^*}$ then $|Z^*| = 1$ must hold for $A$ to occur, and zero drift follows as in the previous subcase. Now let us assume $w_{j^*} < w_{i^*}$ and thus $g_{j^*} < g_{i^*}$. For notational convenience, we redefine $i^* := \min\{i \mid w_i = w_{i^*}\}$. We consider $Z_r := Z \cap \{1, \dots, i^* - 1\}$, the set of potentially flipping zero-bits right of $i^*$, denote $k := |Z_r|$ and note that in the worst case, $Z_r = \{i^* - 1, \dots, i^* - k\}$ as the $g_i$ are non-decreasing. By using $\tilde{p} := \mathrm{Prob}(Z^* \cap Z_r \ne \emptyset) = 1 - (1 - 1/n)^k$ and the definition of conditional probabilities, we obtain under $S_{222}$ that every bit from $Z_r$ is flipped (not necessarily independently) with probability at most $(1/n)/\tilde{p} = \frac{1}{n(1 - (1 - 1/n)^k)}$. We now assume that all the corresponding $a'$ are accepted. This is pessimistic for the following reasons: Consider a rejected $a'$. If $|Z^*| = 1$ then our prerequisite $j^* < i^*$ and the monotonicity of the $g_i$ imply a negative $\Delta_t$-value. If $|Z^*| > 1$ then the negative $\Delta_t$-value is due to the fact $g_i \le g_{i-1} + g_{i-2}$ for $3 \le i \le n$. Hence, using the linearity of expectation we get

$$E(\Delta_t \mid S_{222}) \ge g_{i^*} - \frac{1}{n\tilde{p}} \sum_{j \in Z_r} g_j \ge g_{i^*} - \frac{1}{n\tilde{p}} \sum_{j=1}^{k} g_{i^*-j} \ge \left(1 + \frac{1}{n-1}\right)^{i^*-1} - \frac{\sum_{j=1}^{k} \left(1 + \frac{1}{n-1}\right)^{i^*-1-j}}{n\left(1 - (1 - 1/n)^k\right)} = \frac{1}{n}\left(1 + \frac{1}{n-1}\right)^{i^*-1} \ge 0,$$

where the equality follows since $1 + 1/(n-1) = (1 - 1/n)^{-1}$ and $\sum_{j=1}^{k} (1 - 1/n)^{j} = (n-1)\left(1 - (1 - 1/n)^k\right)$.

This completes the proof. □

6 Lower Bounds

In this section, we state lower bounds that prove the results from Theorem 4 to be tight up to lower-order terms for a wide range of mutation probabilities. Moreover, we show that the lower bounds hold for the very large class of mutation-based algorithms (Algorithm 2). Recall that a list of the most important consequences is given above in Theorem 3. For technical reasons, we split the proof of the lower bounds into two main cases, namely $p = O(n^{-2/3-\varepsilon})$ and $p = \Omega(n^{\varepsilon-1})$ for any constant $\varepsilon > 0$. Unless $p > 1/2$, the proofs go back to OneMax as a worst case, as outlined in the following subsection.



6.1 OneMax as Easiest Linear Function 



Doerr, Johannsen and Winzen (2010a) show with respect to the (1+1) EA with standard mutation probability $1/n$ that OneMax is the "easiest" function from the class of functions with unique global optimum, which comprises the class of linear functions. More precisely, the expected optimization time on OneMax is proved to be smallest within the class.

We will generalize this result to $p \le 1/2$ with moderate additional effort. In fact, we will relate the behavior of an arbitrary mutation-based EA on OneMax to the (1+1) EA$_\mu$ in a similar way to Sudholt (2010, Section 7). The latter algorithm, displayed as Algorithm 3, creates search points uniformly at random from time 0 to time $\mu - 1$ and then chooses a best one from these to be the current search point at time $\mu - 1$; afterwards it works as the standard (1+1) EA. Note that we obtain the standard (1+1) EA for $\mu = 1$. Moreover, we will only consider the case $\mu = \mathrm{poly}(n)$ in order to bound the running time of the initialization. This makes sense since a unique optimum (such as the all-zeros string for OneMax) is with overwhelming probability not found even when drawing $2^{\sqrt{n}}$ random search points.



Algorithm 3 (1+1) EA$_\mu$
for $t := 0 \to \mu - 1$ do
    choose $x_t \in \{0,1\}^n$ uniformly at random.
end for
$x_t := \arg\min\{f(x) \mid x \in \{x_0, \dots, x_t\}\}$ (breaking ties uniformly).
repeat
    create $x'$ by flipping each bit in $x_t$ independently with prob. $p$.
    $x_{t+1} := x'$ if $f(x') \le f(x_t)$, and $x_{t+1} := x_t$ otherwise.
    $t := t + 1$.
until forever.
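
For illustration, the following Python sketch mirrors Algorithm 3 for a function $f$ to be minimized. The names `one_max` and `one_plus_one_ea_mu`, the `max_steps` cut-off, and the stopping test `f(x) == 0` are assumptions made for this sketch and do not appear in the algorithm itself; the stopping test presumes that the optimum has $f$-value 0, as is the case for OneMax with the all-zeros string as optimum.

```python
import random


def one_max(x):
    """Number of one-bits; minimized, so the optimum is the all-zeros string."""
    return sum(x)


def one_plus_one_ea_mu(f, n, mu, p, max_steps, rng):
    """Sketch of the (1+1) EA_mu under the assumption that the optimum has f-value 0.

    Returns the time index of the first optimal search point, or None if no
    optimum was created up to time max_steps (hypothetical cut-off).
    """
    # Initialization: mu uniform random search points at times 0..mu-1.
    init = [[rng.randrange(2) for _ in range(n)] for _ in range(mu)]
    for i, y in enumerate(init):
        if f(y) == 0:
            return i
    best = min(f(y) for y in init)
    x = rng.choice([y for y in init if f(y) == best])  # ties broken uniformly
    t = mu - 1
    while t < max_steps:
        # Standard bit mutation: flip each bit independently with probability p.
        y = [b ^ (rng.random() < p) for b in x]
        t += 1
        if f(y) <= f(x):  # accept if not worse
            x = y
        if f(x) == 0:
            return t
    return None
```

Calling `one_plus_one_ea_mu(one_max, n, 1, 1 / n, max_steps, random.Random(0))` corresponds to the standard (1+1) EA with $\mu = 1$ and $p = 1/n$.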

Our analyses need the monotonicity statement from Lemma 1 below, which is similar to Lemma 11 in Doerr, Johannsen and Winzen (2010a) and whose proof is already sketched in Droste, Jansen, and Wegener (2000, Section 5). Note, however, that Doerr, Johannsen and Winzen (2010a) only consider $p = 1/n$ and have a stronger statement for this case. More precisely, they show $\mathrm{Prob}(|\mathrm{mut}(a)|_1 = j) \ge \mathrm{Prob}(|\mathrm{mut}(b)|_1 = j)$, which does not hold for large $p$. Here and hereinafter, $|x|_1$ denotes the number of ones in a bit string $x$.

Lemma 1. Let $a, b \in \{0,1\}^n$ be two search points satisfying $|a|_1 < |b|_1$. Denote by $\mathrm{mut}(x)$ the random string obtained by mutating each bit of $x$ independently with probability $p$. Let $0 \le j \le n$ be arbitrary. If $p \le 1/2$ then

$$\mathrm{Prob}(|\mathrm{mut}(a)|_1 \le j) \ge \mathrm{Prob}(|\mathrm{mut}(b)|_1 \le j).$$

Proof. We prove the result only for $|b|_1 = |a|_1 + 1$. The general statement then follows by induction on $|b|_1 - |a|_1$.

By the symmetry of the mutation operator, $\mathrm{Prob}(|\mathrm{mut}(x)|_1 \le j)$ is the same for all $x$ with $|x|_1 = |a|_1$. We therefore assume $b \ge a$ (i.e., $b$ is component-wise not less than $a$). In the following, let $s^*$ be the unique index where $b_{s^*} = 1$ and $a_{s^*} = 0$. Let $S(x)$ be the event that bit $s^*$ flips when $x$ is mutated, and let $\overline{S}(x)$ denote its complement. Since bits are flipped independently, it holds $\mathrm{Prob}(S(x)) = p$ for any $x$. We write $a' := \mathrm{mut}(a)$ and $b' := \mathrm{mut}(b)$. Assuming $p \le 1/2$, the aim is to show $\mathrm{Prob}(|a'|_1 \le j) \ge \mathrm{Prob}(|b'|_1 \le j)$, which by the law of total probability is equivalent to

$$\Bigl(\mathrm{Prob}(|a'|_1 \le j \mid \overline{S}(a)) - \mathrm{Prob}(|b'|_1 \le j \mid \overline{S}(b))\Bigr) \cdot (1-p) + \Bigl(\mathrm{Prob}(|a'|_1 \le j \mid S(a)) - \mathrm{Prob}(|b'|_1 \le j \mid S(b))\Bigr) \cdot p \;\ge\; 0. \qquad (*)$$

Note that the relation $\mathrm{Prob}(|a'|_1 \le j \mid \overline{S}(a)) \ge \mathrm{Prob}(|b'|_1 \le j \mid \overline{S}(b))$ follows from a simple coupling argument as $a' \le b'$ holds if the mutation operator flips the bits other than $s^*$ in the same way with respect to $a$ and $b$. Moreover,

$$\mathrm{Prob}(|a'|_1 \le j \mid \overline{S}(a)) - \mathrm{Prob}(|b'|_1 \le j \mid \overline{S}(b)) = \mathrm{Prob}(|b'|_1 \le j \mid S(b)) - \mathrm{Prob}(|a'|_1 \le j \mid S(a))$$

since $a$ is obtained from $b$ by flipping bit $s^*$ and vice versa. Hence, the left-hand side of $(*)$ equals $(1 - 2p)$ times the first difference, which is non-negative for $p \le 1/2$, and $(*)$ follows. □
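
The statement of Lemma 1 can be checked numerically for small $n$. The sketch below is an illustration only and not part of the proof; the function names are hypothetical. It uses the fact that, for $|x|_1 = k$, the number of ones after mutation is the sum of a binomial variable with parameters $k$ and $1-p$ (one-bits that do not flip) and an independent binomial variable with parameters $n-k$ and $p$ (zero-bits that flip).

```python
from math import comb


def binom_pmf(m, q):
    """Probability mass function of a binomial distribution with parameters m and q."""
    return [comb(m, r) * q**r * (1 - q) ** (m - r) for r in range(m + 1)]


def ones_after_mutation_pmf(n, k, p):
    """Exact pmf of |mut(x)|_1 for any x in {0,1}^n with |x|_1 = k."""
    kept = binom_pmf(k, 1 - p)    # one-bits that do not flip
    gained = binom_pmf(n - k, p)  # zero-bits that flip to one
    pmf = [0.0] * (n + 1)
    for r, pr in enumerate(kept):
        for s, ps in enumerate(gained):
            pmf[r + s] += pr * ps
    return pmf


def cdf(pmf):
    """Running sums of a pmf."""
    out, acc = [], 0.0
    for v in pmf:
        acc += v
        out.append(acc)
    return out


def lemma1_holds(n, p, tol=1e-12):
    """Check Prob(|mut(a)|_1 <= j) >= Prob(|mut(b)|_1 <= j) for |b|_1 = |a|_1 + 1 and all j.

    Consecutive numbers of ones suffice, since the general case follows by transitivity.
    """
    cdfs = [cdf(ones_after_mutation_pmf(n, k, p)) for k in range(n + 1)]
    return all(cdfs[k][j] >= cdfs[k + 1][j] - tol
               for k in range(n) for j in range(n + 1))
```

For instance, `lemma1_holds(10, 0.1)` evaluates the inequality for all pairs with $n = 10$, while a value such as $p = 0.9$ shows that the restriction $p \le 1/2$ cannot simply be dropped.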

The following theorem is a generalization of Theorem 9 by Doerr, Johannsen and Winzen (2010a) to the case $p \le 1/2$ instead of $p = 1/n$. However, we not only generalize to higher mutation probabilities, but also consider the more general class of mutation-based algorithms. Finally, we prove stochastic ordering, while Doerr, Johannsen and Winzen (2010a) inspect only the expected optimization times. Still, many ideas of the original proof can be taken over and be combined with the proof of Theorem 5 in Sudholt (2010).



Theorem 6. Consider a mutation-based EA $A$ with population size $\mu$ and mutation probability $p \le 1/2$ on any function with a unique global optimum. Then the optimization time of $A$ is stochastically at least as large as the optimization time of the (1+1) EA$_\mu$ on OneMax.

Proof. Let $f$ denote the function with unique global optimum, and assume w.l.o.g. that this optimum is the all-zeros string. For any sequence $X = (x_0, \dots, x_{\ell-1})$ of search points over $\{0,1\}^n$, let $q(X)$ be the probability that $X$ represents the first $\ell$ search points $x_0, \dots, x_{\ell-1}$ created by Algorithm $A$ on $f$ (its so-called history up to time $\ell-1$). For any history $X$ with $q(X) > 0$, let $T_f(X)$ denote the random optimization time of Algorithm $A$ on $f$, given that its history up to time $\ell-1$ equals $X$. Let

$$\Xi_\ell := \Bigl\{ X = (x_0, \dots, x_{\ell-1}) \in \mathop{\times}_{i=1}^{\ell} \{0,1\}^n \Bigm| q(X) > 0 \Bigr\}$$

denote the set of all possible histories of length $\ell$ with respect to Algorithm $A$ on $f$, and let $\Xi := \bigcup_{\ell=1}^{\infty} \Xi_\ell$ denote all possible histories of finite length. Finally, for any $X \in \Xi$, let $L(X)$ denote the length of $X$.

Given any $X \in \Xi$, let (1+1) EA($X$) be the algorithm that chooses a search point with minimal number of ones from $X$ as current search point at time $L(X) - 1$ (breaking ties uniformly) and afterwards proceeds as the standard (1+1) EA on OneMax. Now, let $T_{\mathrm{OneMax}}(X)$ denote the random optimization time of the (1+1) EA($X$). We claim that the stochastic ordering

$$\mathrm{Prob}(T_f(X) \ge t) \ge \mathrm{Prob}(T_{\mathrm{OneMax}}(X) \ge t)$$

holds for every $X \in \Xi$ satisfying $L(X) \ge \mu$ and every $t \ge 0$. Note that the random vector of initial search points $X^* := (x_0, \dots, x_{\mu-1})$ follows the same distribution in both Algorithm $A$ and the (1+1) EA$_\mu$. In particular, the two algorithms are identical before time $\mu - 1$, i.e., before initialization is finished. Furthermore, (1+1) EA($X^*$) is the (1+1) EA$_\mu$ initialized with $X^*$. Altogether, the claimed stochastic ordering implies the theorem. Moreover, regardless of the length $L(X)$, the claim is obvious for $t < L(X)$ since the behavior up to time $L(X)$ is fixed.

For any $X \in \Xi$, let $|X|_1 := \min\{|x|_1 \mid x \in X\}$ denote the best number of ones in the history, where $x \in (x_0, \dots, x_{\ell-1})$ means that $x = x_i$ for some $i \in \{0, \dots, \ell-1\}$. For every $k \in \{0, \dots, n\}$, every $\ell \ge \mu$ and every $t \ge 0$, let

$$\tilde{p}_{k,\ell}(t) := \min\{\mathrm{Prob}(T_{\mathrm{OneMax}}(X) \ge \ell + t) \mid X \in \Xi_\ell,\ |X|_1 = k\}$$

be the minimum probability of the (1+1) EA($X$) needing at least $\ell + t$ steps to optimize OneMax from a history of length $\ell$ whose best search point has exactly $k$ one-bits. Due to the symmetry of the OneMax function and the definition of (1+1) EA($X$), we have $\mathrm{Prob}(T_{\mathrm{OneMax}}(X) \ge \ell + t) = \tilde{p}_{k,\ell}(t)$ for every $X$ satisfying $L(X) = \ell$ and $|X|_1 = k$. In other words, the minimum can be omitted from the definition of $\tilde{p}_{k,\ell}$.

Furthermore, for every $k \in \{0, \dots, n\}$, every $\ell \ge \mu$ and every $t \ge 0$, let

$$\hat{p}_{k,\ell}(t) := \min\{\mathrm{Prob}(T_f(X) \ge \ell + t) \mid X \in \Xi_\ell,\ |X|_1 \ge k\}$$

be the minimum probability of Algorithm $A$ needing at least $\ell + t$ steps to optimize $f$ from a history of length $\ell \ge \mu$ whose best search point has at least $k$ one-bits. We will show $\hat{p}_{k,\ell}(t) \ge \tilde{p}_{k,\ell}(t)$ for any $k \in \{0, \dots, n\}$ and $\ell \ge \mu$ by induction on $t$, where $\tilde{p}_{k,\ell}(t)$ is the corresponding probability for the (1+1) EA($X$) on OneMax. In particular, by choosing $\ell := \mu$ and applying the law of total probability with respect to the outcomes of $|X^*|_1$, this will imply the above-mentioned stochastic ordering and, therefore, the theorem.

If $k \ge 1$ then $\tilde{p}_{k,\ell}(0) = \hat{p}_{k,\ell}(0) = 1$ for any $\ell \ge \mu$ since the condition means that the first $\ell$ search points do not contain the optimum. Moreover, $\hat{p}_{0,\ell}(t) = \tilde{p}_{0,\ell}(t) = 0$ for any $t \ge 0$ and $\ell \ge \mu$ since a history beginning with the all-zeros string corresponds to optimization time $0$ and thus minimizes both $\mathrm{Prob}(T_f(X) \ge t + \ell)$ and $\mathrm{Prob}(T_{\mathrm{OneMax}}(X) \ge t + \ell)$. Now let us assume that there is some $t \ge 0$ such that $\hat{p}_{k,\ell}(t') \ge \tilde{p}_{k,\ell}(t')$ holds for all $0 \le t' \le t$, $k \in \{0, \dots, n\}$, and $\ell \ge \mu$. Note that the inequality has already been proven for all $t$ if $k = 0$.

Consider the (1+1) EA($X$) for an arbitrary $X$ satisfying $L(X) = \ell \ge \mu$ and $|X|_1 = k + 1$ for some $k \in \{0, \dots, n-1\}$. Let some $x \in \{0,1\}^n$, where $|x|_1 = k + 1$, be chosen from $X$ and let $y \in \{0,1\}^n$ be the random search point generated by flipping each bit in $x$ independently with probability $p$. The (1+1) EA($X$) will accept $y$ as new search point at time $\ell + 1 > \mu$ if and only if $|y|_1 \le |x|_1 = k + 1$. Hence,

$$\tilde{p}_{k+1,\ell}(t+1) = \mathrm{Prob}(|y|_1 \ge k+1) \cdot \tilde{p}_{k+1,\ell+1}(t) + \sum_{j=0}^{k} \mathrm{Prob}(|y|_1 = j) \cdot \tilde{p}_{j,\ell+1}(t). \qquad (*)$$

Next, let $\hat{X}$, where again $L(\hat{X}) = \ell \ge \mu$, be a history satisfying $\mathrm{Prob}(T_f(\hat{X}) \ge \ell + t + 1) = \hat{p}_{k+1,\ell}(t+1)$ and let $\hat{x}$ be the (random) search point that is chosen for mutation at time $\ell$ in order to obtain the equality of the two probabilities. Note that $|\hat{x}|_1 \ge k + 1$. Moreover, let $\hat{y} \in \{0,1\}^n$ be the random search point generated by flipping each bit in $\hat{x}$ independently with probability $p$. Let $X'$ be the concatenation of $\hat{X}$ and $\hat{y}$. Then

$$\hat{p}_{k+1,\ell}(t+1) = \mathrm{Prob}(|\hat{y}|_1 \ge k+1) \cdot \mathrm{Prob}(T_f(X') \ge \ell + 1 + t \mid |\hat{y}|_1 \ge k+1) + \sum_{j=0}^{k} \mathrm{Prob}(|\hat{y}|_1 = j) \cdot \mathrm{Prob}(T_f(X') \ge \ell + 1 + t \mid |\hat{y}|_1 = j),$$
which, by definition of the $\hat{p}_{j,\ell+1}(t)$, gives us the lower bound

$$\hat{p}_{k+1,\ell}(t+1) \ge \mathrm{Prob}(|\hat{y}|_1 \ge k+1) \cdot \hat{p}_{k+1,\ell+1}(t) + \sum_{j=0}^{k} \mathrm{Prob}(|\hat{y}|_1 = j) \cdot \hat{p}_{j,\ell+1}(t).$$

To relate the last inequality to $(*)$ above, we interpret the right-hand side as a function of $k + 2$ variables. More precisely, let $\phi(a_0, \dots, a_{k+1}) := \sum_{j=0}^{k+1} a_j \hat{p}_{j,\ell+1}(t)$ and consider the vectors

$$v^{(f)} = (v_0^{(f)}, \dots, v_{k+1}^{(f)}) := (\mathrm{Prob}(|\hat{y}|_1 = 0), \dots, \mathrm{Prob}(|\hat{y}|_1 = k), \mathrm{Prob}(|\hat{y}|_1 \ge k+1))$$

and

$$v^{(O)} = (v_0^{(O)}, \dots, v_{k+1}^{(O)}) := (\mathrm{Prob}(|y|_1 = 0), \dots, \mathrm{Prob}(|y|_1 = k), \mathrm{Prob}(|y|_1 \ge k+1)).$$

If we can show that $\phi(v^{(f)}) \ge \phi(v^{(O)})$, then we can conclude

$$\hat{p}_{k+1,\ell}(t+1) \ge \phi(v^{(f)}) \ge \phi(v^{(O)}) \ge \mathrm{Prob}(|y|_1 \ge k+1) \cdot \tilde{p}_{k+1,\ell+1}(t) + \sum_{j=0}^{k} \mathrm{Prob}(|y|_1 = j) \cdot \tilde{p}_{j,\ell+1}(t) = \tilde{p}_{k+1,\ell}(t+1),$$

where the last inequality follows from the induction hypothesis and the equality is from $(*)$. This will complete the induction step.

To show the outstanding inequality, we use that for $0 \le j \le k$

$$\mathrm{Prob}(|y|_1 \le j) \ge \mathrm{Prob}(|\hat{y}|_1 \le j),$$

which follows from Lemma 1 since $|\hat{x}|_1 \ge |x|_1$ and $p \le 1/2$. In other words,

$$\sum_{i=0}^{j} v_i^{(O)} \ge \sum_{i=0}^{j} v_i^{(f)}$$

for $0 \le j \le k$ and $\sum_{i=0}^{k+1} v_i^{(O)} = \sum_{i=0}^{k+1} v_i^{(f)}$ since we are dealing with probability distributions. Altogether, the vector $v^{(O)}$ majorizes the vector $v^{(f)}$. Since they are based on increasingly restrictive conditions, the $\hat{p}_{j,\ell+1}(t)$ are non-decreasing in $j$. Hence, $\phi$ is Schur-concave (cf. Theorem A.3 in Chapter 3 of Marshall, Olkin, and Arnold, 2011), which proves $\phi(v^{(f)}) \ge \phi(v^{(O)})$ as desired. □
6.2 Large Mutation Probabilities



It is not too difficult to show that mutation probabilities $p = \Omega(n^{\varepsilon-1})$, where $\varepsilon > 0$ is an arbitrary constant, make the (1+1) EA (and also the (1+1) EA$_\mu$) flip too many bits for it to optimize linear functions efficiently.

Theorem 7. On any linear function, the optimization time of an arbitrary mutation-based EA with $\mu = \mathrm{poly}(n)$ and $p = \Omega(n^{\varepsilon-1})$ for some constant $\varepsilon > 0$ is bounded from below by $2^{\Omega(n^{\varepsilon})}$ with probability $1 - 2^{-\Omega(n^{\varepsilon})}$.

Proof. Due to Theorem 6, it suffices to show the result for the (1+1) EA$_\mu$ on OneMax. The following two statements follow from Chernoff bounds (and a union bound over $\mu = \mathrm{poly}(n)$ search points in the second statement).

1. Due to the lower bound on $p$, the probability of a single step not flipping at least $\lfloor pi/2 \rfloor$ bits out of a set of $i$ bits is at most $2^{-\Omega(pi)} = 2^{-\Omega(i n^{\varepsilon-1})}$.

2. The search point $x_{\mu-1}$ chosen after initialization has at least $n/3$ and at most $2n/3$ one-bits with probability $1 - 2^{-\Omega(n)}$.

Furthermore, as we consider OneMax, the number of one-bits is non-increasing over time. We assume $x_{\mu-1}$ to be non-optimal and to have at most $2n/3$ one-bits, which contributes a term of only $2^{-\Omega(n)}$ to the failure probability. The assumption means that all future search points accepted by the (1+1) EA$_\mu$ will have at least $n/3$ zero-bits. In order to reach the optimum, none of these is allowed to flip. As argued above (with $i = n/3$), the probability of this happening is $2^{-\Omega(n^{\varepsilon})}$, and by the union bound, the total probability is still $2^{-\Omega(n^{\varepsilon})}$ in a number of $2^{c n^{\varepsilon}}$ steps if the constant $c$ is chosen small enough. □

Mutation-based EAs have only been defined for $p \le 1/2$ since flipping bits with higher probability seems to contradict the idea of a mutation. However, for the sake of completeness, we also analyze the (1+1) EA with $p > 1/2$ and obtain exponential expected optimization times. Note that we do not know whether OneMax is the easiest linear function in this case.

Theorem 8. On any linear function, the expected optimization time of the (1+1) EA with mutation probability $p > 1/2$ is bounded from below by $2^{\Omega(n)}$.

Proof. We distinguish between two cases.

Case 1: $p \ge 3/4$. Here we assume that the initial search point has at least $n/2$ leading zeros and is not optimal, the probability of which is at least $2^{-n/2-1}$. Since the $n/2$ most significant bits are set correctly in this search point, all accepted search points must have at least $n/2$ zeros as well. To create the optimum, it is necessary that none of these flips. This occurs only with probability at most $(1/4)^{n/2}$, hence the expected optimization time under the assumed initialization is at least $4^{n/2}$. Altogether, the unconditional expected optimization time is at least $2^{-n/2-1} \cdot 4^{n/2} = 2^{\Omega(n)}$.

Case 2: $1/2 < p < 3/4$. Now the aim is to show that all created search points have a number of ones that is in the interval $I := [n/8, 7n/8]$ with probability $1 - 2^{-\Omega(n)}$. This will imply the theorem by the usual waiting time argument.

Let $x$ be a search point such that $|x|_1 \in I$. We consider the event of mutating $x$ to some $x'$ where $|x'|_1 < n/8$. Since $p > 1/2$, this is most likely if $|x|_1 = 7n/8$ (using the ideas behind Lemma 1 for the complement of $x$). Still, using Chernoff bounds and $p < 3/4$, at least $(1/5) \cdot (7n/8) \ge n/8$ one-bits are not flipped with probability $1 - 2^{-\Omega(n)}$. By a symmetrical argument, the probability is $2^{-\Omega(n)}$ that $|x'|_1 > 7n/8$.

□
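
The arithmetic behind Case 1 of the preceding proof can be spelled out as follows. This is only a worked restatement of the bounds given in the proof; the symbol $q$ is introduced here for the probability that a single step creates the optimum under the assumed initialization, and $T$ denotes the optimization time.

```latex
\begin{align*}
q &\le (1-p)^{n/2} \le (1/4)^{n/2}
   && \text{(none of the $n/2$ zeros may flip, and $1-p \le 1/4$)},\\
E(T \mid \text{assumed initialization}) &\ge 1/q \ge 4^{n/2} = 2^{n}
   && \text{(waiting time argument)},\\
E(T) &\ge 2^{-n/2-1} \cdot 2^{n} = 2^{n/2-1} = 2^{\Omega(n)}
   && \text{(law of total probability)}.
\end{align*}
```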

As was to be expected, no polynomial expected optimization times were possible 
for the range of p considered in this subsection. 

6.3 Small Mutation Probabilities 

We now turn to mutation probabilities that are bounded from above by roughly $1/n^{2/3}$. Here relatively precise lower bounds can be obtained.

Theorem 9. On any linear function, the expected optimization time of an arbitrary mutation-based EA with $\mu = \mathrm{poly}(n)$ and $p = O(n^{-2/3-\varepsilon})$ is bounded from below by

$$(1 - o(1))\,(1-p)^{-n}\,(1/p)\,\min\{\ln n,\ \ln(1/(p^3 n^2))\}.$$

As a consequence of Theorem 9, we obtain that the bound from Theorem 4 is tight (up to lower-order terms) for the (1+1) EA as long as $\ln(1/(p^3 n^2)) = \ln n - o(\ln n)$. This condition is weaker than $p = O((\ln n)/n)$. If $p = \omega((\ln n)/n)$ or $p = o(1/\mathrm{poly}(n))$, then Theorem 9 in conjunction with Theorems 7 and 8 implies superpolynomial expected optimization time. Thus, the bounds are tight for all $p$ that allow polynomial optimization times.

Before the proof, we state another important consequence, implying the statement from Theorem 3 that using the (1+1) EA with mutation probability $1/n$ is optimal for any linear function.

Corollary 3. On any linear function, the expected optimization time of a mutation-based EA with $\mu = \mathrm{poly}(n)$ and $p = c/n$, where $c > 0$ is a constant, is bounded from below by $(1 - o(1))((e^c/c)\, n \ln n)$. If $p = \omega(1/n)$ or $p = o(1/n)$, the expected optimization time is $\omega(n \ln n)$.

Proof. The first statement follows immediately from Theorem 9 using $(1 - c/n)^{-n} \ge e^c$ and $\ln(1/(p^3 n^2)) = \ln n - O(\ln c)$. The second one follows, depending on $p$, either from Theorem 7 or, in that case assuming $p = O((\ln n)/n)$, from Theorem 9, noting that $(1-p)^{-n}(1/p) \ge e^{np}/p = \omega(n)$ if $p = \omega(1/n)$ or $p = o(1/n)$. □
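
To see why $c = 1$ is the best constant in Corollary 3, note that the factor $e^c/c$ has derivative $e^c (c-1)/c^2$ and is therefore minimized at $c = 1$, where it equals $e$. The following Python sketch evaluates this factor and the main term of Theorem 9; the function names are hypothetical, and the $(1 - o(1))$ factor is deliberately omitted, so the values are only indicative for finite $n$.

```python
import math


def leading_constant(c):
    """Factor e^c / c from Corollary 3 for mutation probability p = c/n."""
    return math.exp(c) / c


def theorem9_main_term(n, p):
    """Main term (1-p)^(-n) (1/p) min{ln n, ln(1/(p^3 n^2))}, without the 1-o(1) factor."""
    log_factor = min(math.log(n), math.log(1.0 / (p**3 * n**2)))
    return (1 - p) ** (-n) / p * log_factor


if __name__ == "__main__":
    n = 1000
    for c in (0.25, 0.5, 1.0, 2.0, 4.0):
        print(c, leading_constant(c), theorem9_main_term(n, c / n))
```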

Recall that by Theorem 6 it is enough to prove Theorem 9 for the (1+1) EA$_\mu$ on OneMax. As mentioned above, this is a well-studied function, for which strong upper and lower bounds are known in the case $p = 1/n$. Our result for general $p$ is inspired by the proof of Theorem 1 in Doerr, Fouz and Witt (2010), which uses an implicit multiplicative drift theorem for lower bounds. Therefore, we now need an upper bound on the multiplicative drift, which is given by the following generalization of Lemma 6 in Doerr, Fouz and Witt (2011).



Lemma 2. Consider the (1+1) EA with mutation probability $p$ for the minimization of OneMax. Given a current search point with $i$ one-bits, let $I'$ denote the random number of one-bits in the subsequent search point (after selection). Then we have

$$E[i - I'] \le ip\,\bigl(1 - p + ip^2/(1-p)\bigr)^{n-i}.$$

Proof. Note that $I' \le i$ since the number of one-bits in the process is non-increasing. Hence, only mutations that flip at least as many one-bits as zero-bits have to be considered. The event that the total number of one-bits is decreased by $k \ge 0$ can be partitioned into the subevents $F_{k,j}$ that $k + j$ one-bits and $j$ zero-bits flip, for all $j \in \mathbb{Z}_{\ge 0}$. The probability of an individual event $F_{k,j}$ equals

$$\binom{i}{k+j} p^{k+j} (1-p)^{i-k-j} \cdot \binom{n-i}{j} p^{j} (1-p)^{n-i-j},$$

where $\binom{a}{b} := 0$ for $b > a$. Thus, we have

$$E[i - I'] \le \sum_{k=1}^{i} k \sum_{j=0}^{n-i} \binom{i}{k+j} \binom{n-i}{j} p^{k+2j} (1-p)^{n-k-2j} \le \underbrace{\sum_{k=1}^{i} k \binom{i}{k} p^{k} (1-p)^{n-k}}_{=: S_1} \cdot \underbrace{\sum_{j=0}^{n-i} \binom{n-i}{j} i^{j} \Bigl(\frac{p}{1-p}\Bigr)^{2j}}_{=: S_2},$$

where the second inequality uses $\binom{i}{k+j} \le i^{j} \cdot \binom{i}{k}$. Factoring out $(1-p)^{n-i}$ of $S_1$, we recognize the expected value of a binomial distribution with parameters $i$ and $p$, which means $S_1 = (1-p)^{n-i} \cdot ip$. Regarding $S_2$, we apply the Binomial Theorem and obtain $S_2 = (1 + i(p/(1-p))^2)^{n-i}$. The product of $S_1$ and $S_2$ is the upper bound from the lemma. □
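
As a sanity check, the bound of Lemma 2 can be compared with the exact drift for small $n$. The sketch below is an illustration under the model stated in the lemma, with hypothetical function names: it sums over the number of flipping one-bits and zero-bits and counts only accepted mutations, i.e., those flipping at least as many one-bits as zero-bits.

```python
from math import comb


def exact_drift(n, i, p):
    """Exact E[i - I'] for the (1+1) EA minimizing OneMax from a point with i one-bits."""
    total = 0.0
    for a in range(i + 1):  # a = number of one-bits that flip
        pa = comb(i, a) * p**a * (1 - p) ** (i - a)
        # z = number of zero-bits that flip; the offspring is accepted iff z <= a.
        for z in range(min(a, n - i) + 1):
            pz = comb(n - i, z) * p**z * (1 - p) ** (n - i - z)
            total += (a - z) * pa * pz
    return total


def lemma2_bound(n, i, p):
    """Upper bound i p (1 - p + i p^2 / (1 - p))^(n - i) from Lemma 2."""
    return i * p * (1 - p + i * p**2 / (1 - p)) ** (n - i)
```

For $i = n$ both quantities equal $np$, since no zero-bits exist and every mutation is accepted; for small $i$ and $p = 1/n$ the two values are close, which reflects that the lemma is nearly tight in the regime used later.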
Proof of Theorem 9. As already mentioned, we may assume that the linear function is OneMax and that the algorithm is the (1+1) EA$_\mu$. The idea is to apply Theorem 2, which is the above-mentioned multiplicative drift theorem for lower bounds, for a suitable choice of the parameters. Let $\tilde{p} := \max\{p, 1/n\}$. We first observe that the probability of flipping at least $b := \tilde{p} n \ln n$ bits in a single step is bounded from above by

$$\binom{n}{\tilde{p} n \ln n} \cdot p^{\tilde{p} n \ln n} \le \Bigl(\frac{e p n}{\tilde{p} n \ln n}\Bigr)^{\tilde{p} n \ln n} = 2^{-\Omega(\tilde{p} n (\ln n)(\ln \ln n))},$$

where we have used $p \le \tilde{p}$. Hence, the probability is superpolynomially small. In the following, we assume that the number of one-bits changes by at most $b$ in each of a total number of at most $(1-p)^{-n} n \ln n = 2^{O(pn) + O(\ln n)}$ steps that are considered for the lower bound we want to prove. This event holds with probability $1 - o(1)$, which, using the law of total probability, decreases the bound only by a factor of $1 - o(1)$.

Let $X^{(t)}$ denote the number of one-bits at time $t$ and note that this is non-increasing over time. We choose $s_{\min} := n \tilde{p} \ln^2 n$ and $\beta := 1/\ln n$ and introduce $s_{\max} := 1/(2 \tilde{p}^2 n \ln n)$ as an additional upper bound. Note that $s_{\max} \le n/(2 \ln n)$ due to $\tilde{p} \ge 1/n$. Since the $\mu$ initial search points are drawn uniformly at random and $\mu = \mathrm{poly}(n)$, the number of one-bits after initialization is at least $s_{\max}$ with probability $1 - o(1)$. Again, assuming this to happen, we lose a factor $1 - o(1)$ in the bound we want to prove. Moreover, due to our assumption $p = O(n^{-2/3-\varepsilon})$ (which means $\tilde{p} = O(n^{-2/3-\varepsilon})$), we have $b = n \tilde{p} \ln n \le 1/(4 \tilde{p}^2 n \ln n) = s_{\max}/2$ for $n$ large enough. Altogether, it holds $s_{\max}/2 \le X^{(t^*)} \le s_{\max}$ at the first point of time $t^*$ where $X^{(t^*)} \le s_{\max}$. To simplify issues, we consider the process only from time $t^*$ on. Skipping the first $t^*$ steps, we pessimistically assume $s_0 := s_{\max}/2$ as starting point and $X^{(t)} \le s_{\max}$ for all $t \ge 0$. The second condition of the drift theorem is now fulfilled since the bound on $p$ also implies $b = \tilde{p} n \ln n \le 1/(2 \tilde{p}^2 n \ln^2 n) = \beta s_{\max}$, where $\beta s_{\max}$ is the largest value for $\beta s$ to be taken into account.

Assembling the factors from the lower bound in Theorem 2, we get $(1-\beta)/(1+\beta) = 1 - o(1)$. Furthermore, we have $\ln(s_0/s_{\min}) = \ln(1/(4 \tilde{p}^3 n^2 \ln^3 n)) = \ln(1/(\tilde{p}^3 n^2)) - O(\ln \ln n)$, which is $(1 - o(1)) \ln(1/(\tilde{p}^3 n^2))$ by our assumption on $p$. Note that $\ln(1/(\tilde{p}^3 n^2)) = \min\{\ln n, \ln(1/(p^3 n^2))\}$ by the definition of $\tilde{p}$. If we can prove that $1/\delta = (1 - o(1))(1-p)^{-n}(1/p)$, the proof is complete.

To bound $\delta$, we use Lemma 2. Note that $i \le s_{\max}$ holds in our simplified process. Using the lemma and recalling that $1/\tilde{p} \le 1/p$, we get

$$\frac{E(X^{(t)} - X^{(t+1)} \mid X^{(t)} = i)}{i} \le p \Bigl(1 - p + \frac{s_{\max}\, p^2}{1-p}\Bigr)^{n - s_{\max}} \le p \Bigl(1 - p + \frac{1}{n \ln n}\Bigr)^{n - s_{\max}} \le p \Bigl((1-p)\Bigl(1 + \frac{2}{n \ln n}\Bigr)\Bigr)^{n - s_{\max}} = (1 + o(1))\, p\, (1-p)^{n},$$

where we have used $p \le 1/2$ and $(1 + 2/(n \ln n))^n = 1 + o(1)$ and $(1-p)^{-s_{\max}} = (1-p)^{-1/(2 \tilde{p}^2 n \ln n)} = 1 + o(1)$. Hence, $1/\delta \ge (1 - o(1))(1/p)(1-p)^{-n}$ as suggested, which completes the proof. □

Finally, we remark that the expected optimization time of the (1+1) EA with $p = 1/n$ on OneMax is known to be $en \ln n - \Theta(n)$ (Doerr, Fouz and Witt, 2011). Hence, in conjunction with our upper bound and Theorem 6, we obtain for $p = 1/n$ that the expected optimization time of the (1+1) EA varies by at most an additive term $O(n)$ within the class of linear functions.



Conclusions 

We have presented new bounds on the expected optimization time of the (1+1) EA on the class of linear functions. The results are now tight up to lower-order terms, which applies to any mutation probability $p = O((\ln n)/n)$. This means that $1/n$ is the optimal mutation probability on any linear function. We have for the first time studied the case $p = \omega(1/n)$ and proved a phase transition from polynomial to exponential running time in the regime $\Theta((\ln n)/n)$. The lower bounds show that OneMax is the easiest linear function for all $p \le 1/2$, and they apply not only to the (1+1) EA but also to the large class of mutation-based EAs. They thus exhibit the (1+1) EA as an optimal mutation-based algorithm on linear functions. The upper bounds hold with high probability. As proof techniques, we have employed multiplicative drift in conjunction with adaptive potential functions. In the future, we hope to see these techniques applied to the analysis of other randomized search heuristics.

We finish with an open problem. Even though our proofs of upper bounds would 
simplify for the function BinVal, this function is often considered as a worst case. Is 
it true that the runtime of the (1+1) EA on BinVal is stochastically largest within 
the class of linear functions, thereby complementing the result that the runtime on 
OneMax is stochastically smallest? 
A Multiplicative Drift for Lower Bounds



In this appendix, we supply the proof of Theorem 2, the lower-bound version of the multiplicative drift theorem. The proof follows the one of Theorem 5 in Lehre and Witt (2010) and uses the following additive drift theorem.

Theorem 10 (Jägersküpper (2007)). Let $X^{(1)}, X^{(2)}, \dots$ be random variables with bounded support and let $T$ be the stopping time defined by $T := \min\{t \mid X^{(1)} + \dots + X^{(t)} \ge g\}$ for a given $g > 0$. If $E(T)$ exists and $E(X^{(i)} \mid T \ge i) \le u$ for $i \in \mathbb{N}$, then $E(T) \ge g/u$.

The proof of Theorem 2 also makes use of the following simple lemma.

Lemma 3. Let $X$ be any random variable, and $k$ any real number. If it holds that $\mathrm{Prob}(X < k) > 0$, then $E(X) \ge E(X \mid X < k)$.

Proof. Define $p := \mathrm{Prob}(X < k)$ and $\mu_k := E(X \mid X < k)$. The lemma clearly holds when $p = 1$ such that we assume $0 < p < 1$ in the following. If $E(X)$ is positive infinite then $E(X) \ge \mu_k$ is obvious. If $E(X)$ is negative infinite then so is $\mu_k$ by the law of total probability. Finally, for finite $E(X)$, the law of total probability yields

$$E(X) = (1-p) \cdot E(X \mid X \ge k) + p \cdot \mu_k \ge (1-p) \cdot k + p \cdot \mu_k \ge (1-p) \cdot \mu_k + p \cdot \mu_k = E(X \mid X < k).$$

□

Proof of Theorem 2. The proof generalizes the proof of Theorem 1 in Doerr, Fouz and Witt (2010). The random variable $T$ is non-negative. Hence, if the expectation of $T$ does not exist, then it is positive infinite and the theorem holds. We condition on the event $T > t$, but we omit stating this event in the expectations for notational convenience. We define the stochastic process $Y^{(t)} := \ln(X^{(t)})$ (note that $X^{(t)} \ge 1$), and apply Theorem 10 with respect to the random variables

$$\Delta_{t+1}(s) := \bigl(Y^{(t)} - Y^{(t+1)} \mid X^{(t)} = s\bigr) = \Bigl(\ln\Bigl(\frac{X^{(t)}}{X^{(t+1)}}\Bigr) \Bigm| X^{(t)} = s\Bigr).$$

We consider the time until $X^{(t)} \le s_{\min}$ if $X^{(0)} = s_0$ and use the parameter $g := \ln(s_0/s_{\min})$. By the law of total probability, the expectation of $\Delta_{t+1}(s)$ can be expressed as

$$\mathrm{Prob}(s - X^{(t+1)} \ge \beta s) \cdot E(\Delta_{t+1}(s) \mid s - X^{(t+1)} \ge \beta s) + \mathrm{Prob}(s - X^{(t+1)} < \beta s) \cdot E(\Delta_{t+1}(s) \mid s - X^{(t+1)} < \beta s). \qquad (1)$$

By applying the second condition from the theorem, the first term in (1) can be bounded from above by $\frac{\beta\delta}{\ln s} \cdot \ln s = \beta\delta$. The logarithmic function is concave. Hence, by Jensen's inequality, the second term in (1) is at most

$$\ln\Bigl(E\Bigl(\frac{X^{(t)}}{X^{(t+1)}} \Bigm| s - X^{(t+1)} < \beta s \wedge X^{(t)} = s\Bigr)\Bigr) = \ln\Bigl(1 + E\Bigl(\frac{s - X^{(t+1)}}{X^{(t+1)}} \Bigm| s - X^{(t+1)} < \beta s \wedge X^{(t)} = s\Bigr)\Bigr).$$

By using the inequality $\ln(1 + x) \le x$ as well as the conditions $X^{(t+1)} > (1-\beta)s$ and $X^{(t+1)} \le X^{(t)}$, this simplifies to

$$E\Bigl(\frac{s - X^{(t+1)}}{X^{(t+1)}} \Bigm| s - X^{(t+1)} < \beta s \wedge X^{(t)} = s\Bigr) \le E\Bigl(\frac{s - X^{(t+1)}}{(1-\beta)s} \Bigm| s - X^{(t+1)} < \beta s \wedge X^{(t)} = s\Bigr).$$

By Lemma 3 and the first condition from the theorem, it follows that the second term in (1) is at most

$$E\Bigl(\frac{s - X^{(t+1)}}{(1-\beta)s} \Bigm| X^{(t)} = s\Bigr) \le \frac{\delta}{1-\beta}.$$

Altogether, we obtain $E(\Delta_{t+1}(s)) \le (\beta + 1/(1-\beta))\delta \le ((\beta+1)/(1-\beta))\delta$. From Theorem 10 it now follows that

$$E(T \mid X^{(0)} = s_0) \ge \frac{\ln(s_0/s_{\min})}{\delta} \cdot \frac{1-\beta}{1+\beta}.$$

□
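
For concrete parameters, the lower bound resulting from the above proof, namely $\ln(s_0/s_{\min})/\delta$ multiplied by $(1-\beta)/(1+\beta)$, can be evaluated directly. The following Python helper is a minimal sketch with a hypothetical name; it only computes the formula and does not verify the two drift conditions, which have to be established separately for the process at hand.

```python
import math


def multiplicative_drift_lower_bound(s0, s_min, delta, beta):
    """Evaluate ln(s0 / s_min) / delta * (1 - beta) / (1 + beta).

    Assumes 0 < s_min < s0, delta > 0 and 0 < beta < 1; the drift conditions
    themselves are not checked here.
    """
    if not (0 < s_min < s0):
        raise ValueError("need 0 < s_min < s0")
    if delta <= 0 or not (0 < beta < 1):
        raise ValueError("need delta > 0 and 0 < beta < 1")
    return math.log(s0 / s_min) / delta * (1 - beta) / (1 + beta)
```

With $\beta = 1/\ln n$, as chosen in the proof of Theorem 9, the factor $(1-\beta)/(1+\beta)$ tends to 1, so that the bound is dominated by $\ln(s_0/s_{\min})/\delta$.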